autonomous driving system
HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception
The bird's-eye-view (BEV) perception plays a critical role in autonomous driving systems, involving the accurate and efficient detection and tracking of objects from a top-down perspective. To achieve real-time decision-making in self-driving scenarios, low-latency computation is essential. While recent approaches to BEV detection have focused on improving detection precision using Lift-Splat-Shoot (LSS)-based or transformer-based schemas, the substantial computational and memory burden of these approaches increases the risk of system crashes when multiple on-vehicle tasks run simultaneously. Unfortunately, there is a dearth of literature on efficient BEV detector paradigms, let alone achieving realistic speedups.Unlike existing works that focus on reducing computation costs, this paper focuses on developing an efficient model design that prioritizes actual on-device latency.To achieve this goal, we propose a latency-aware design methodology that considers key hardware properties, such as memory access cost and degree of parallelism.Given the prevalence of GPUs as the main computation platform for autonomous driving systems, we develop a theoretical latency prediction model and introduce efficient building operators.By leveraging these operators and following an effective local-to-global visual modeling process, we propose a hardware-oriented backbone that is also optimized for strong feature capturing and fusing.Using these insights, we present a new hardware-oriented framework for efficient yet accurate camera-view BEV detectors.Experiments show that HotBEV achieves a 2\%$\sim$23\% NDS gain, and 2\%$\sim$7.8\%
A Multi-Agent LLM Framework for Design Space Exploration in Autonomous Driving Systems
Shih, Po-An, Wang, Shao-Hua, Li, Yung-Che, Tu, Chia-Heng, Chang, Chih-Han
Designing autonomous driving systems requires efficient exploration of large hardware/software configuration spaces under diverse environmental conditions, e.g., with varying traffic, weather, and road layouts. Traditional design space exploration (DSE) approaches struggle with multi-modal execution outputs and complex performance trade-offs, and often require human involvement to assess correctness based on execution outputs. This paper presents a multi-agent, large language model (LLM)-based DSE framework, which integrates multi-modal reasoning with 3D simulation and profiling tools to automate the interpretation of execution outputs and guide the exploration of system designs. Specialized LLM agents are leveraged to handle user input interpretation, design point generation, execution orchestration, and analysis of both visual and textual execution outputs, which enables identification of potential bottlenecks without human intervention. A prototype implementation is developed and evaluated on a robotaxi case study (an SAE Level 4 autonomous driving application). Compared with a genetic algorithm baseline, the proposed framework identifies more Pareto-optimal, cost-efficient solutions with reduced navigation time under the same exploration budget. Experimental results also demonstrate the efficiency of the adoption of the LLM-based approach for DSE. We believe that this framework paves the way to the design automation of autonomous driving systems.
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
VP-AutoTest: A Virtual-Physical Fusion Autonomous Driving Testing Platform
Cui, Yiming, Fang, Shiyu, Zhang, Jiarui, Huang, Yan, Xu, Chengkai, Zhu, Bing, Zhang, Hao, Hang, Peng, Sun, Jian
The rapid development of autonomous vehicles has led to a surge in testing demand. Traditional testing methods, such as virtual simulation, closed-course, and public road testing, face several challenges, including unrealistic vehicle states, limited testing capabilities, and high costs. These issues have prompted increasing interest in virtual-physical fusion testing. However, despite its potential, virtual-physical fusion testing still faces challenges, such as limited element types, narrow testing scope, and fixed evaluation metrics. To address these challenges, we propose the Virtual-Physical Testing Platform for Autonomous Vehicles (VP-AutoTest), which integrates over ten types of virtual and physical elements, including vehicles, pedestrians, and roadside infrastructure, to replicate the diversity of real-world traffic participants. The platform also supports both single-vehicle interaction and multi-vehicle cooperation testing, employing adversarial testing and parallel deduction to accelerate fault detection and explore algorithmic limits, while OBU and Redis communication enable seamless vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) cooperation across all levels of cooperative automation. Furthermore, VP-AutoTest incorporates a multidimensional evaluation framework and AI-driven expert systems to conduct comprehensive performance assessment and defect diagnosis. Finally, by comparing virtual-physical fusion test results with real-world experiments, the platform performs credibility self-evaluation to ensure both the fidelity and efficiency of autonomous driving testing. Please refer to the website for the full testing functionalities on the autonomous driving public service platform OnSite:https://www.onsite.com.cn.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Transportation > Ground > Road (1.00)
- Information Technology (1.00)
RoboDriveVLM: A Novel Benchmark and Baseline towards Robust Vision-Language Models for Autonomous Driving
Liao, Dacheng, Qi, Mengshi, Shu, Peng, Zhang, Zhining, Lin, Yuxin, Liu, Liang, Ma, Huadong
Current Vision-Language Model (VLM)-based end-to-end autonomous driving systems often leverage large language models to generate driving decisions directly based on their understanding of the current scene. However, such systems introduce multiple risks in real-world driving scenarios. T o evaluate whether VLMs are truly viable for autonomous driving, we introduce RoboDriveBench, the first robustness benchmark focused on end-to-end trajectory prediction tasks. This benchmark systematically evaluates two critical categories of real-world challenges for VLM-based end-to-end autonomous driving systems through 11 simulated scenarios encompassing various corruption types, including 6 scenarios of sensor corruption caused by environmental variations, along with 5 cases of prompt corruption resulting from human intervention and data transmission failures. Each corruption type includes 250 unique driving scenarios and 5,689 frames, resulting in 64,559 total trajectory prediction cases per evaluation. T o overcome these real-world challenges, we propose a novel VLM-based autonomous driving framework called Robo-DriveVLM, which enhances robustness by mapping more multimodal data--e.g., lidar and radar--into a unified latent space. Furthermore, we introduce a new T est-Time Adaptation (TTA) method based on cross-modal knowledge distillation to improve the robustness of VLM-based autonomous driving systems. Through extensive experiments, our work highlights the limitations of current VLM-based end-to-end autonomous driving systems and provides a more reliable solution for real-world deployment. Source code and datasets will be released.
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
Generalized Inequality-based Approach for Probabilistic WCET Estimation
Toba, Hayate, Yano, Atsushi, Azumi, Takuya
Estimating the probabilistic Worst-Case Execution Time (pWCET) is essential for ensuring the timing correctness of real-time applications, such as in robot IoT systems and autonomous driving systems. While methods based on Extreme Value Theory (EVT) can provide tight bounds, they suffer from model uncertainty due to the need to decide where the upper tail of the distribution begins. Conversely, inequality-based approaches avoid this issue but can yield pessimistic results for heavy-tailed distributions. This paper proposes a method to reduce such pessimism by incorporating saturating functions (arctangent and hyperbolic tangent) into Chebyshev's inequality, which mitigates the influence of large outliers while preserving mathematical soundness. Evaluations on synthetic and real-world data from the Autoware autonomous driving stack demonstrate that the proposed method achieves safe and tighter bounds for such distributions.
- Information Technology (0.75)
- Transportation > Ground > Road (0.55)
- Automobiles & Trucks (0.55)
NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving
Peng, Qucheng, Bai, Chen, Zhang, Guoxiang, Xu, Bo, Liu, Xiaotong, Zheng, Xiaoyin, Chen, Chen, Lu, Cheng
Autonomous driving systems have made significant advances in Q&A, perception, prediction, and planning based on local visual information, yet they struggle to incorporate broader navigational context that human drivers routinely utilize. We address this critical gap between local sensor data and global navigation information by proposing NavigScene, an auxiliary navigation-guided natural language dataset that simulates a human-like driving environment within autonomous driving systems. Moreover, we develop three complementary paradigms to leverage NavigScene: (1) Navigation-guided Reasoning, which enhances vision-language models by incorporating navigation context into the prompting approach; (2) Navigation-guided Preference Optimization, a reinforcement learning method that extends Direct Preference Optimization to improve vision-language model responses by establishing preferences for navigation-relevant summarized information; and (3) Navigation-guided Vision-Language-Action model, which integrates navigation guidance and vision-language models with conventional driving models through feature fusion. Extensive experiments demonstrate that our approaches significantly improve performance across perception, prediction, planning, and question-answering tasks by enabling reasoning capabilities beyond visual range and improving generalization to diverse driving scenarios. This work represents a significant step toward more comprehensive autonomous driving systems capable of navigating complex, unfamiliar environments with greater reliability and safety.
- North America > United States > Florida > Orange County > Orlando (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- North America > United States > California > Santa Clara County > Santa Clara (0.05)
- (2 more...)
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
MMRHP: A Miniature Mixed-Reality HIL Platform for Auditable Closed-Loop Evaluation
Li, Mingxin, Hu, Haibo, Deng, Jinghuai, Xi, Yuchen, Chen, Xinhong, Wang, Jianping
Abstract--V alidation of autonomous driving systems requires a trade-off between test fidelity, cost, and scalability. While miniaturized hardware-in-the-loop (HIL) platforms have emerged as a promising solution, a systematic framework supporting rigorous quantitative analysis is generally lacking, limiting their value as scientific evaluation tools. T o address this challenge, we propose MMRHP, a miniature mixed-reality HIL platform that elevates miniaturized testing from functional demonstration to rigorous, reproducible quantitative analysis. The core contributions are threefold. First, we propose a systematic three-phase testing process oriented toward the Safety of the Intended Functionality (SOTIF) standard, providing actionable guidance for identifying the performance limits and triggering conditions of otherwise correctly functioning systems. Second, we design and implement a HIL platform centered around a unified spatiotemporal measurement core to support this process, ensuring consistent and traceable quantification of physical motion and system timing. Finally, we demonstrate the effectiveness of this solution through comprehensive experiments. The platform itself was first validated, achieving a spatial accuracy of 10.27 mm RMSE and a stable closed-loop latency baseline of approximately 45 ms. Subsequently, an in-depth Autoware case study leveraged this validated platform to quantify its performance baseline and identify a critical performance cliff at an injected latency of 40 ms. This work shows that a structured process, combined with a platform offering a unified spatio-temporal benchmark, enables reproducible, interpretable, and quantitative closed-loop evaluation of autonomous driving systems. Index T erms--Autonomous Driving, Hardware-in-the-Loop (HIL), Mixed Reality, CARLA, SOTIF, V alidation and V erifi-cation (V&V). HE commercial deployment of autonomous vehicles (A Vs) faces a critical bottleneck that has shifted from achieving basic functionality to delivering statistically convincing safety in long-tail scenarios [1].
- Asia > China > Hong Kong (0.05)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Texas (0.04)
- (5 more...)
Work Zones challenge VLM Trajectory Planning: Toward Mitigation and Robust Autonomous Driving
Liao, Yifan, Sun, Zhen, Qiu, Xiaoyun, Zhao, Zixiao, Tang, Wenbing, He, Xinlei, Zheng, Xinhu, Zhang, Tianwei, Huang, Xinyi, Han, Xingshuo
Visual Language Models (VLMs), with powerful multi-modal reasoning capabilities, are gradually integrated into autonomous driving by several automobile manufacturers to enhance planning capability in challenging environments. However, the trajectory planning capability of VLMs in work zones, which often include irregular layouts, temporary traffic control, and dynamically changing geometric structures, is still unexplored. To bridge this gap, we conduct the first systematic study of VLMs for work zone trajectory planning, revealing that mainstream VLMs fail to generate correct trajectories in 68.0% of cases. To better understand these failures, we first identify candidate patterns via subgraph mining and clustering analysis, and then confirm the validity of 8 common failure patterns through human verification. Building on these findings, we propose REACT-Drive, a trajectory planning framework that integrates VLMs with Retrieval-Augmented Generation (RAG). Specifically, REACT-Drive leverages VLMs to convert prior failure cases into constraint rules and executable trajectory planning code, while RAG retrieves similar patterns in new scenarios to guide trajectory generation. Experimental results on the ROADWork dataset show that REACT -Drive yields a reduction of around 3 in average displacement error relative to VLM baselines under evaluation with Qwen2.5-VL. In addition, REACT-Drive yields the lowest inference time (0.58s) compared with other methods such as fine-tuning (17.90s).
- Europe > Italy > Lombardy > Milan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- (11 more...)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
ME$^3$-BEV: Mamba-Enhanced Deep Reinforcement Learning for End-to-End Autonomous Driving with BEV-Perception
Lu, Siyi, Liu, Run, Yang, Dongsheng, He, Lei
Autonomous driving systems face significant challenges in perceiving complex environments and making real-time decisions. Traditional modular approaches, while offering interpretability, suffer from error propagation and coordination issues, whereas end-to-end learning systems can simplify the design but face computational bottlenecks. This paper presents a novel approach to autonomous driving using deep reinforcement learning (DRL) that integrates bird's-eye view (BEV) perception for enhanced real-time decision-making. We introduce the \texttt{Mamba-BEV} model, an efficient spatio-temporal feature extraction network that combines BEV-based perception with the Mamba framework for temporal feature modeling. This integration allows the system to encode vehicle surroundings and road features in a unified coordinate system and accurately model long-range dependencies. Building on this, we propose the \texttt{ME$^3$-BEV} framework, which utilizes the \texttt{Mamba-BEV} model as a feature input for end-to-end DRL, achieving superior performance in dynamic urban driving scenarios. We further enhance the interpretability of the model by visualizing high-dimensional features through semantic segmentation, providing insight into the learned representations. Extensive experiments on the CARLA simulator demonstrate that \texttt{ME$^3$-BEV} outperforms existing models across multiple metrics, including collision rate and trajectory accuracy, offering a promising solution for real-time autonomous driving.
- Research Report > Promising Solution (0.54)
- Overview > Innovation (0.34)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Robotics & Automation (0.82)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Architecture > Real Time Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Interactive Adversarial Testing of Autonomous Vehicles with Adjustable Confrontation Intensity
Guo, Yicheng, Xu, Chengkai, Liu, Jiaqi, Zhang, Hao, Hang, Peng, Sun, Jian
--Scientific testing techniques are essential for ensuring the safe operation of autonomous vehicles (A Vs), with high-risk, highly interactive scenarios being a primary focus. T o address the limitations of existing testing methods, such as their heavy reliance on high-quality test data, weak interaction capabilities, and low adversarial robustness, this paper proposes ExamPPO, an interactive adversarial testing framework that enables scenario-adaptive and intensity-controllable evaluation of autonomous vehicles. The framework models the Surrounding V ehicle (SV) as an intelligent examiner, equipped with a multi-head attention-enhanced policy network, enabling context-sensitive and sustained behavioral interventions. A scalar confrontation factor is introduced to modulate the intensity of adversarial behaviors, allowing continuous, fine-grained adjustment of test difficulty. Coupled with structured evaluation metrics, ExamPPO systematically probes A V's robustness across diverse scenarios and strategies. Extensive experiments across multiple scenarios and A V strategies demonstrate that ExamPPO can effectively modulate adversarial behavior, expose decision-making weaknesses in tested A Vs, and generalize across heterogeneous environments, thereby offering a unified and reproducible solution for evaluating the safety and intelligence of autonomous decision-making systems. UTONOMOUS driving technologies have achieved substantial progress in recent years, driven by advances in perception, planning, and control systems [1], [2], [3]. These innovations have accelerated the development and deployment of intelligent vehicles in structured and semi-structured environments. This work is supported in part by the National Natural Science Foundation of China (52472451, 62433014), the Shanghai Scientific Innovation Foundation (No.23DZ1203400), and the Fundamental Research Funds for the Central Universities. Yicheng Guo, Chengkai Xu, Peng Hang, and Jian Sun are with the College of Transportation, Tongji University, Shanghai 201804, China.
- Asia > China > Shanghai > Shanghai (0.44)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Education (0.46)
- Transportation > Ground > Road (0.30)